EC2

Instances Types

  • R (RAM): in-memory caches
  • C (CPU): compute / databases
  • M: general purpose / web app
  • I (I/O): databases
  • G (GPU): video rendering / machine learning
  • T2 / T3 (burstable instances)

Placement Groups

  • Control EC2 instance placement strategy
  • Group strategies:
    • Cluster: clusters instances into a low-latency group in a single AZ
    • Spread: spreads instances across underlying hardware (up to 7 instances per group per AZ)
    • Partition: spreads instances across partitions (up to 7 partitions & 100s of instances per group per AZ)
  • Move an instance into or out of a placement group
    • Stop the instance
    • Modify the instance placement group
    • Start the instance

Instances Launch Types

  • On Demand Instances
  • Spot Instances
  • Reserved Instances
    • Reserved Instances
    • Convertible Reserved Instances
    • Scheduled Reserved Instances
  • Dedicated Instances
  • Dedicated Hosts
    • Book an entire physical server
    • Can define host affinity so that instance reboots are kept on the same host

Spot Instances

  • Define max spot price, and you will get the instance while current spot instances price are less the the price
  • You can also buy a Spot Block, that the instances will not be interrupted during a specified time frame (1 to 6 hours)
  • Spot Fleets
    • Fleets of Spot Instances and optionally On-Demand Instances
    • Can define max price for each Spot Instance
    • Can have a mix of instance types
    • Support for EC2, ASG, ECS with ASG, AWS Batch

Metrics

  • CPU Utilization
  • Network In / Out
  • Disk Read / Write for instance store
  • Status Check
    • Instance status (VM)
    • System status (underlying hardware)
  • RAM is not a EC2 metric, need to push it to CloudWatch as a custom metric

Instance Recovery

  • CloudWatch Alarm can monitor EC2 instances status check
  • If any check failed, it can auto recovery the instance
  • Preserve same private / public / Elastic IP, metadata, placement group

Auto Scaling

About Auto Scaling

  • Spot Fleets support
  • Update AMI
    • Update launch configuration / template
    • Terminate instances manually
    • You can use CloudFormation to automate the steps
  • Scheduled scaling actions
    • Increase or decrease instances in a schedule
  • Lifecycle Hooks
    • Perform actions before launch or terminate a instance

Scaling Policies

  • Simple Scaling (Step Scaling): increase or decrease instances based on 2 CloudWatch alarms
  • Target Tracking: smart adjust instances to close to a metric

Scaling Processes

  • Launch
  • Terminate
  • HealthCheck
  • ReplaceUnhealthy
  • AZRebalance
  • AlarmNotification: when accept notification from CloudWatch
  • ScheduledActions
  • AddToLoadBalancer
  • You can suspend above processes

Health Checks

  • Check types
    • EC2 Status Checks
    • ELB Health Checks (HTTP)
  • ASG will launch a new instance to replace the unhealthy one

Architecture: Update instances in ASG

  • Solution 1: Create new instances with the new template in the same ASG, and terminate old instances later
  • Solution 2: Create a new ASG connected to the same load balancer, and split traffic between ASGs
  • Solution 3: Create a new load balancer, use Route 53 CNAME weight record to split traffic (has risks because it’s based on DNS)

ECS

  • Classic ECS: Running ECS on user-provisioned EC2 instances
  • Fargate: Running ECS tasks on AWS managed compute (serverless)
  • EKS: ECS for Kubernetes
  • ECR: Docker Container Registry hosted by AWS

ECS Concepts

  • Cluster
  • Service
  • Tasks & task definition: containers running to create the applications

ALB Port Mapping

  • Allows you to run multiple instances of the same application on the same EC2 instance

ECS Security & Networking

  • IAM security
    • EC2 Instance Role must have basic ECS permissions
    • ECS Task level should have an IAM Task Role
  • Integration with SSM Parameter Store & Secrets Manager
  • Tasks networking
    • none: no network, no port mappings
    • bridge: use Docker’s virtual network
    • host: use the underlying host network interface
    • awsvpc
      • A task has its own ENI and a private address
      • Enhanced security with Security Groups, monitoring, and VPC flow logs
      • Default mode for Fargate

ECS Auto Scaling

  • CPU and RAM is tracked in CloudWatch at ECS service level
  • ESC Service Auto Scaling cannot auto scale EC2 instances
  • Fargate Auto Scaling is easier to use (serverless)

AWS Lambda

Lambda Runtime

  • Node.js
  • Python
  • Ruby
  • Java
  • Golang
  • C#
  • Powershell

Lambda Limits

  • RAM: 128 MB to 3G
  • Timeout: 15 minutes
  • Storage: up to 512 MB
  • Deployment package: up to 250 MB
  • Concurrency: 1000 (soft limit)
  • Latencies
    • Cold Invocation: ~100 ms
    • Warm Invocation: ~ms
    • Use Provisioned Concurrency to keep some functions warm
    • Use X-Ray to trace end-to-end latency

Lambda Security

  • IAM Roles
  • Lambda Resource-based Policies

Lambda in VPC

  • Lambda can be deployed in VPC
  • Connect internet from private subnet
    • Use NAT and IGW
    • Use Endpoints
  • CloudWatch do not need NAT or Endpoint

Lambda Logging

  • CloudWatch
    • CloudWatch Logs
    • CloudWatch Metrics display Lambda metrics
    • Make sure Lambda has correct IAM role to write logs to CloudWatch Logs
  • X-Ray
    • Trace Lambda
    • Need enable in Lambda configuration or use AWS SDK in Code

Lambda Invocation Types

  • Synchronous Invocations
    • Results is returned right away
    • Client need handle the errors
  • Asynchronous Invocations
    • Will be used when triggered by S3 Events, SNS, CloudWatch Events
    • Lambda attempts to retry on errors (up to 3 times)
    • Make sure the processing is idempotent
    • Can define a DLQ (to SNS or SQS) for failed processing
  • Event Source Mapping
    • Pull batches from stream sources
      • Kinesis Data Streams
      • SQS
      • DynamoDB Streams
    • If your function returns an error, the entire batch is reprocessed until success

Lambda Destinations

  • Can configure to send result to a destination
  • In asynchronous invocations, can define destinations for successful and failed event to
    • SQS
    • SNS
    • Lambda
    • EventBridge bus
  • In Event Source Mapping, can send discarded event batches to
    • SQS
    • SNS
  • AWS recommends you use destinations instead of DLQ

Lambda Versions & Aliases

  • Versions
    • Versions have increasing version numbers
    • Versions are immutable, Any changes to a function will publish new versions
    • Latest version is marked as $LATEST, which is default version
    • Versions get their own ARN
  • Aliases
    • An alias point to a version
    • Aliases are mutable and can be defined by user
    • Use aliases to do Blue / Green deployment
      • CodeDeploy can help to automate traffic shift
      • Linear: grow traffic every N minutes until 100%
      • Canary: try at x% then jump to 100%
      • AllAtOnce: immediate
    • Aliases get their own ARN

ELB

ELB Types

  • Classic Load Balancer (v1)
    • HTTP(S), TCP
  • Application Load Balancer (v2)
    • HTTP(S), WebSocket
  • Network Load Balancer (v2)
    • TCP(TLS), UDP
  • AWS recommends v2 as they provide more features
  • ELB can be internal (with a private IP) or external (with a public IP)

Classic Load Balancers

  • CLB Listener can be
    • HTTP(s) (Layer 7)
    • TCP(TLS) (Layer 4)
    • Internal traffic must be at the same layer
  • CLB supports only one SSL certificate
    • You must edit SAN (Subject Alternate Name) in SSL certificate to support multiple host names
    • Also, you can use multiple CLBs or use ALB with SNI (Server Name Indication)

Application Load Balancers

  • ALB is layer 7
  • ALB can balance traffic to
    • Target groups
      • EC2 instances
      • ECS tasks
      • Lambda functions
      • Private IP address
    • Multiple applications using Dynamic Port Mapping
  • Support redirects from HTTP to HTTPS
  • ALB supports routing HTTP requests based on URL
  • ALB supports SNI

Network Load Balancers

  • NLB is layer 4
  • NLB has a static public IP per AZ and supports assigning Elastic IP
  • NLB has less latency 100 ms (400 ms for ALB)
  • NLB target groups
    • EC2 instances
    • ECS tasks
    • Private IP addresses
  • Proxy Protocol
    • NLB can append a proxy protocol header to the TCP data
    • So you can send additional connection information such as the source and destination

Cross-zone Load Balancing

  • CLB
    • Disable by default
    • No charges
  • ALB
    • Always enabled
    • No charges
  • NLB
    • Disable by default
    • Charged for cross-AZ data transfer if enabled

Load Balancing Stickiness

  • CLB & ALB support stickiness
  • The same client is always redirected to the same instance
  • Stickiness may cause imbalance
  • Alternative is to cache session data in ElastiCache or DynamoDB

API Gateway

  • Helps expose Lambda, HTTP, AWS Services as REST APIs
  • Limits
    • 29 seconds timeout
    • 10 MB max payload size

Deployment Stages

  • You can create many stages of API Gateway, and name them as you want
    • such as dev, test, prod
  • Stages can be rolled back

Architecture: API Gateway in front of S3

  • API Gateway has 10 MB payload size limit, so proxy S3 through API Gateway is not perfect
  • Use API Gateway to invoke a Lambda function to generate a pre-signed URL from S3, and send the URL back to the client

Endpoint Types

  • Edge-Optimized (default)
    • Requests are routed through the CloudFront Edge locations
    • The API Gateway still lives in one region
  • Regional
    • Lower latency if client is in the same region
    • Can manually combine with CloudFront
  • Private
    • Can only be accessed from the VPC using VPC Endpoints
    • Need use resource-based policies to define access

Gateway Cache

  • Helps to reduce calls made to the backend
  • Default TTL is 300 seconds (up to 3600s)
  • Caches are defined per Stage
  • You can overwrite cache settings of each methods
  • Clients can control the cache TTL with header Cache-Control: max-age=xxx
    • Need proper IAM role
  • You can flush the cache immediately
  • You can encrypt cache optionally
  • Cache capacity is between 521 MB to 237 GB

API Gateway HTTP Error Codes

  • 4xx
  • 5xx
    • 29 seconds timeout will cause 504

API Gateway Security

  • API Gateway can load certificates
  • Resource-based policy
  • IAM roles at the API level
  • CORS
    • Control which domains can call your API

API Gateway Authentication

  • IAM based access
    • Good for providing access within your own network
    • Pass IAM credentials in headers through Sig V4
  • Custom Authorizer (normally use Lambda)
    • OAuth, SAML, etc.
  • Cognito User Pools
    • Client authenticates with Cognito

API Gateway Logging & Monitoring

  • CloudWatch Logs
    • Enable CloudWatch logging at the Stage level
    • Can send custom logs
    • Can send logs directly into Kinesis Data Firehose
  • CloudWatch Metric
    • Metrics are by stage
      • Latency and Cache Hits
    • Can enable detailed metrics
  • X-Ray
    • Tracing requests to get extra information
    • You can get the full picture if you integrate API Gateway with Lambda

Route 53

Route 53 Records

  • Types
    • A: hostname to IPv4
    • AAAA: hostname to IPv6
    • CNAME: hostname to hostname
    • Alias: hostname to AWS resource
      • Can also be used for root apex record (omit www)
  • Records has TTL

Routing Policy

Simple Routing Policy

  • Maps a hostname to a single resource
  • No health check
  • Can return multiple resources, but client will choose one randomly
    • Called Multi-value Routing Policy

Weight Routing Policy

  • Maps a hostname to multiple resources with weight
  • Can do load balancing
  • Can associate with health checks

Failover Routing Policy

  • If primary resource fails in health check, Route 53 will failover to the secondary one

Latency Routing Policy

  • Redirect to the resource that has the least latency close to the client
  • Has a failover capability if enable health checks

Geo Location Routing Policy

  • Routing based on user location
    • Different from Latency Routing Policy
  • Should create a default policy in case of no matching

Nested Records

  • Use Alias Records (to a Route 53 record) to nest Routing policies

Private DNS

  • Can use Route 53 as internal private DNS in VPC(s)
  • Must enable the VPC settings enableDnsHostNames and enableDnsSupport
  • The VPC(s) called Private Hosted Zone

Route 53 DNSSEC

  • Route 53 supports DNSSEC for domain registration
  • Route 53 dose not provide DENSEC service
    • You need another DNS provider or custom DNS server on EC2

Route 53 Health Checks

  • Health Check Targets
    • End points
    • CloudWatch Alarms
    • Other Health Checks (calculated Health Check)
  • Health Check Metrics
    • Response (only 2xx and 3xx can pass health checks)
    • Response body (first 5120 bytes)
    • Other Health Checks results
  • Health Checks can trigger CloudWatch Alarms

Architecture: Health Check with Private Subnet

  • Health Checks cannot access private end points
  • Solution:
    • Create a CloudWatch Metric for the private end point
    • Create a CloudWatch Alarm associated with the metric
    • Create a Health Check checks the alarm

Architecture: RDS Multi-region Failover

  • A RDS DB with a RDS Read Replica in another region
  • Solution:
    • Create a CloudWatch Alarm for the main DB
    • Create a Health Check checks the alarm
    • Trigger a CloudWatch Alarm if check fails
    • Then trigger a CloudWatch Event or SNS topic
    • Then trigger a Lambda function
      • Update Route 53 DNS records
      • Promote Read Replica to be the main DB

Architecture: Add a VPC to Private Hosted Zone

  • Set VPC peering between the new VPC and central VPC
  • Associate the new VPC
    • If the VPCs are in different accounts, association can only be done through CLI